feat(cloud-agent): add Kilo SDK session facade #3671

Merged
eshurakov merged 6 commits into main from equal-newsstand
Jun 3, 2026
Conversation

Contributor

@eshurakov eshurakov commented Jun 2, 2026

Summary

Why

Cloud Agent sessions need a stable, authenticated SDK surface so external clients can attach without depending on sandbox topology or exposing internal wrapper details. The facade gives @kilocode/sdk/v2 consumers a narrow public contract for owned root sessions while preserving durable admission, runtime fencing, and cold-history access when a wrapper is unavailable.

What was done

  • Add an authenticated /kilo facade backed by a per-user UserKiloFacade Durable Object. It supports owned root-session listing, detail and message reads, async text prompts with optional Kilo agent/model selection, abort, global SSE, and session-scoped SSE.
  • Keep reads live-first through the sandbox wrapper /kilo-proxy path, with a persisted session-ingest fallback for session detail and transcript reads. Cold transcript pages use native cursors, bounded materialization, and omission metadata for oversized or forward-compatible unsupported items.
  • Route prompt_async and abort through CloudAgentSession instead of directly to the wrapper so admission and interruption remain durable across cold runtimes.
  • Add a fenced wrapper global-feed producer path: the wrapper converts runtime SSE into an authenticated WebSocket feed, and the facade filters, virtualizes, and fans events back out as public SSE.
  • Keep the public projection intentionally narrow: child sessions remain hidden, unsupported SDK routes return 501, typed outer routing fields are virtualized, and nested owner-visible payload content is preserved.
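The live-first read path with a persisted fallback can be sketched roughly as follows. This is a minimal illustration, not the facade's actual code: `readLiveFirst`, `LiveRead`, and `ReadResult` are hypothetical names, and the two reader callbacks stand in for the wrapper /kilo-proxy call and the session-ingest read.

```typescript
// Hypothetical result of asking the wrapper for a live read.
type LiveRead<T> = { available: true; body: T } | { available: false };

// Tags the response with where it came from (illustrative only).
type ReadResult<T> = { source: "live" | "cold"; body: T };

// Try the live runtime first; consult persisted history only when the
// runtime reports itself unavailable.
async function readLiveFirst<T>(
  readLive: () => Promise<LiveRead<T>>,
  readCold: () => Promise<T>,
): Promise<ReadResult<T>> {
  const live = await readLive();
  if (live.available) {
    return { source: "live", body: live.body };
  }
  return { source: "cold", body: await readCold() };
}
```

The sketch models unavailability as a value rather than an exception so that unrelated failures are not silently converted into cold reads; whether the real facade makes the same choice is not stated in this PR.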

High-level SDK facade architecture

sequenceDiagram
  autonumber
  participant SDK as @kilocode/sdk/v2 client
  participant Worker as cloud-agent-next Worker
  participant Facade as UserKiloFacade DO
  participant SessionDO as CloudAgentSession DO
  participant Wrapper as sandbox wrapper
  participant Runtime as in-process Kilo SDK server
  participant Ingest as session-ingest / SessionIngestDO

  SDK->>Worker: Authenticated /kilo/* request
  Worker->>Facade: Route with authenticated user context

  rect rgb(235, 245, 255)
    Note over SDK,Ingest: Live-first session detail and transcript reads
    Facade->>Wrapper: GET /kilo-proxy/session/:id[/message]
    Wrapper->>Runtime: Forward SDK read
    alt Runtime is available
      Runtime-->>Wrapper: SDK-compatible response
      Wrapper-->>Facade: Return live response
    else Runtime is unavailable
      Facade->>Ingest: Read persisted snapshot or transcript page
      Ingest-->>Facade: Bounded cold projection with native cursor metadata
    end
    Facade-->>SDK: Projected public response via Worker
  end

  rect rgb(245, 245, 235)
    Note over SDK,Wrapper: Durable mutations
    SDK->>Facade: POST prompt_async or abort via Worker
    Facade->>SessionDO: Admit prompt or interrupt execution
    opt Runtime is ready
      SessionDO->>Wrapper: Dispatch admitted work
    end
    Facade-->>SDK: Accepted or idempotent response via Worker
  end

  rect rgb(240, 250, 240)
    Note over SDK,Runtime: Fenced public event stream
    Runtime-->>Wrapper: /global/event SSE
    Wrapper->>Worker: Authenticated producer WebSocket upgrade
    Worker->>SessionDO: Validate producer identity
    SessionDO-->>Worker: Producer accepted
    Worker->>Facade: Forward producer socket
    Wrapper-->>Worker: Runtime event frames
    Worker-->>Facade: Forward event frames
    Facade-->>SDK: Filtered and virtualized public SSE
  end

Architecture decision

Decision: expose Cloud Agent sessions through a narrow authenticated /kilo facade that implements the relevant @kilocode/sdk/v2 contract, rather than introducing a Cloud Agent-specific client API.

Context: existing UI clients already integrate with the Kilo SDK for session reads, conversation flows, async prompts, aborts, and event consumption. Cloud Agents provide similar capabilities, but their internal lifecycle differs: sessions can move between live sandbox runtimes and persisted history, mutations need durable admission, and wrapper topology must remain private.

Rationale: implementing the relevant Kilo SDK surface gives those clients a familiar integration path with minimal additional client-side work. The facade adapts the Cloud Agent lifecycle behind that contract: reads are live-first with bounded persisted fallback, prompts and aborts continue through CloudAgentSession, and public events are filtered and virtualized before reaching SDK consumers. The initial facade intentionally supports only the subset needed by external clients; unsupported SDK routes return 501 rather than implying broader compatibility.
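To illustrate the "supported subset or 501" behaviour, here is a small sketch of an allowlist router. The path shapes and route names are assumptions made for illustration; the PR description does not list the exact public paths, and `classify` is a hypothetical helper.

```typescript
type Route = { method: "GET" | "POST"; pattern: RegExp; name: string };

// Assumed path shapes, for illustration only.
const SUPPORTED: Route[] = [
  { method: "GET", pattern: /^\/kilo\/session$/, name: "session.list" },
  { method: "GET", pattern: /^\/kilo\/session\/[^/]+$/, name: "session.get" },
  { method: "GET", pattern: /^\/kilo\/session\/[^/]+\/message$/, name: "session.messages" },
  { method: "POST", pattern: /^\/kilo\/session\/[^/]+\/prompt_async$/, name: "session.promptAsync" },
  { method: "POST", pattern: /^\/kilo\/session\/[^/]+\/abort$/, name: "session.abort" },
  { method: "GET", pattern: /^\/kilo\/global\/event$/, name: "global.event" },
];

// Anything outside the allowlist is reported as 501 Not Implemented
// instead of being forwarded to the runtime.
function classify(
  method: string,
  path: string,
): { status: 200; name: string } | { status: 501 } {
  const hit = SUPPORTED.find((r) => r.method === method && r.pattern.test(path));
  return hit ? { status: 200, name: hit.name } : { status: 501 };
}
```

An explicit allowlist means a new runtime route stays invisible to public clients until someone adds it deliberately.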

Alternatives considered:

  • Create a Cloud Agent-specific client API. This could mirror the internal Cloud Agent lifecycle more directly, but UI clients would need a second integration for substantially the same conversation flows. It would also create another protocol to document, maintain, and evolve alongside the Kilo SDK.
  • Expose sandbox wrappers or runtime SDK servers directly. This would reduce facade code, but it would couple clients to sandbox topology and runtime availability. It would also bypass the stable ownership boundary, durable mutation path, persisted-history fallback, and public event filtering provided by the facade.

Consequences: Cloud Agents gain an adaptation layer that must preserve SDK-compatible behavior across live and cold sessions. In return, existing Kilo SDK clients can integrate with Cloud Agent-backed sessions without adopting a parallel protocol, while the internal runtime remains private and independently evolvable.

Verification

  • Exercised the authenticated facade locally through @kilocode/sdk/v2, including session attach and reads, async chat, abort, and event consumption.
  • Built a quick VS Code extension proof of concept against the facade and verified that an extension can attach to a Cloud Agent-backed SDK session and use the conversation flow.

Visual Changes

N/A

Reviewer Notes

  • Public SDK access is limited to user-owned Cloud Agent-backed root sessions with current organization access. Child sessions remain hidden.
  • Cold transcript pages use an 8 MiB aggregate budget, a page size up to 100, and at most 128 persisted message-row scans. Oversized rows are skipped internally; if a native cursor cannot safely represent progress, the facade returns 413 rather than an ambiguous continuation.
  • Global-feed producer fencing is connection-oriented: identity is validated at WebSocket upgrade and facade acceptance time, and newer wrapper connections replace older sockets. Frames are not revalidated against CloudAgentSession after socket acceptance.
  • Deploy session-ingest before cloud-agent-next; the facade consumes additive session-ingest RPCs. The cloud-agent-next deployment must include the Wrangler v5 migration for the SQLite-backed UserKiloFacade Durable Object.
  • Existing warm wrappers do not expose the new /kilo-proxy or global-feed producer paths. Recycle or drain wrappers during rollout so live SDK reads and SSE delivery consistently use the new wrapper behavior.
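The cold transcript limits above can be pictured with the following sketch. It uses the three quoted limits, but everything else is an assumption: `materializePage`, the `Row` shape, and in particular the rule that a continuation cursor must point at an emitted item (so a page that emits nothing while rows remain yields 413) are illustrative guesses, not the session-ingest implementation.

```typescript
type Row = { id: string; bytes: number };
type Limits = { budgetBytes: number; pageSize: number; maxScans: number };

// Limits quoted in the reviewer notes.
const COLD_LIMITS: Limits = { budgetBytes: 8 * 1024 * 1024, pageSize: 100, maxScans: 128 };

type Page =
  | { status: 200; items: Row[]; omitted: string[]; nextCursor: string | null }
  | { status: 413 };

function materializePage(rows: Row[], limits: Limits = COLD_LIMITS): Page {
  const items: Row[] = [];
  const omitted: string[] = [];
  let used = 0;
  let scanned = 0;
  let lastEmitted: string | null = null;

  for (const row of rows) {
    if (scanned >= limits.maxScans || items.length >= limits.pageSize) break;
    if (row.bytes > limits.budgetBytes) {
      // Too large to ever fit: skip it and record the omission.
      scanned++;
      omitted.push(row.id);
      continue;
    }
    if (used + row.bytes > limits.budgetBytes) break; // leave it for the next page
    scanned++;
    used += row.bytes;
    items.push(row);
    lastEmitted = row.id;
  }

  const more = scanned < rows.length;
  // Assumption: with rows remaining and nothing emitted, no cursor can
  // express progress, so the page is rejected rather than left ambiguous.
  if (more && items.length === 0) return { status: 413 };
  return { status: 200, items, omitted, nextCursor: more ? lastEmitted : null };
}
```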

Comment thread services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts
Comment thread services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts Outdated
Contributor

kilo-code-bot Bot commented Jun 2, 2026

Code Review Summary

Status: No Issues Found | Recommendation: Merge

Executive Summary

Incremental review of commit 9cd84424d (the only change since the last pass): assertProjectedWakeEvent in the SDK E2E test was simplified to check sessionID and info.role instead of delegating to assertProjectedAssistant, and the two call-site assertSafeProjection invocations for wake events were removed. The README was updated in the same PR to note that private-path fixture variants are excluded from this E2E file and covered by unit/Workers-runtime fixtures. No new bugs, security issues, or logic errors introduced.

Resolved Issues (cumulative)
| File | Line | Resolution |
| --- | --- | --- |
| services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts | ~286 | FIXED: validateKiloGlobalFeedProducer now called once at HTTP upgrade time, not per producer WebSocket message |
| services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts | ~697 | FIXED: readRequestJson returns ReadRequestJsonResult discriminated union |
| services/session-ingest/src/dos/kilo-sdk-materialization.ts | 824 | FIXED: z.object({ id: z.string() }) schema no longer re-instantiated on every sort call |
| services/cloud-agent-next/test/e2e/sdk-basic-chat.ts | N/A | FIXED: Wake-event E2E assertions now validate sessionID and info.role correctly; private-path coverage moved to unit/Workers-runtime fixtures per README |
Incremental Changes Reviewed (this pass)
  • services/cloud-agent-next/test/e2e/sdk-basic-chat.ts: assertProjectedWakeEvent simplified; it checks sessionID and info.role directly instead of delegating to assertProjectedAssistant; two call-site assertSafeProjection calls removed; README documents that the exclusion is intentional
Files Reviewed (cumulative)
  • packages/session-ingest-contracts/src/rpc-contract.ts
  • services/cloud-agent-next/src/kilo-facade/basic-prompt.ts
  • services/cloud-agent-next/src/kilo-facade/cloud-agent-extension-events.ts
  • services/cloud-agent-next/src/kilo-facade/public-sdk-projection.ts
  • services/cloud-agent-next/src/kilo-facade/public-sdk-projection.test.ts
  • services/cloud-agent-next/src/kilo-facade/session-proxy.ts
  • services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts
  • services/cloud-agent-next/src/kilo-facade/user-kilo-facade.test.ts
  • services/cloud-agent-next/src/persistence/CloudAgentSession.ts
  • services/cloud-agent-next/src/router/handlers/session-execution.ts
  • services/cloud-agent-next/src/router/handlers/session-send.ts
  • services/cloud-agent-next/src/server.ts
  • services/cloud-agent-next/src/server.test.ts
  • services/cloud-agent-next/src/session-ingest-binding.ts
  • services/cloud-agent-next/src/session-service.ts
  • services/cloud-agent-next/src/session/queue-message.ts
  • services/cloud-agent-next/src/session/wrapper-global-feed-validation.ts
  • services/cloud-agent-next/src/shared/http-proxy.ts
  • services/cloud-agent-next/src/shared/http-query.ts
  • services/cloud-agent-next/src/types.ts
  • services/cloud-agent-next/wrapper/src/global-feed-manager.ts
  • services/cloud-agent-next/wrapper/src/global-feed.ts
  • services/cloud-agent-next/wrapper/src/main.ts
  • services/cloud-agent-next/wrapper/src/restore-session.ts
  • services/cloud-agent-next/wrapper/src/server.ts
  • services/cloud-agent-next/wrapper/src/utils.ts
  • services/cloud-agent-next/wrangler.jsonc
  • services/cloud-agent-next/test/integration/kilo-facade-runtime.test.ts
  • services/cloud-agent-next/test/e2e/sdk-basic-chat.ts
  • services/session-ingest/src/dos/SessionIngestDO.ts
  • services/session-ingest/src/dos/kilo-sdk-materialization.ts
  • services/session-ingest/src/session-ingest-rpc.ts
  • services/session-ingest/src/session-ingest-rpc.test.ts
  • services/session-ingest/src/types/session-sync.ts
  • services/session-ingest/src/util/compaction.ts
  • services/session-ingest/test/integration/session-ingest-do.test.ts
  • services/session-ingest/wrangler.test.jsonc

Reviewed by claude-4.6-sonnet-20260217 · 637,738 tokens

Review guidance: REVIEW.md from base branch main

Comment thread services/session-ingest/src/dos/kilo-sdk-materialization.ts Outdated
Comment thread services/cloud-agent-next/src/kilo-facade/user-kilo-facade.ts
Comment thread services/cloud-agent-next/test/e2e/sdk-basic-chat.ts Outdated
Contributor

@jeanduplessis jeanduplessis left a comment

Assuming stale-producer fencing is addressed, the overall design looks like a good starting point. A stable /kilo facade keeps sandbox and wrapper topology out of the public SDK contract, while the existing session DO remains the lifecycle authority. That feels like the right boundary for shipping quickly without locking us into the current runtime implementation.

A few points may be worth considering before external consumers start depending on behavior:

  • Event delivery semantics: current SSE feed appears best-effort, without replay after disconnect. That seems reasonable for UI updates if clients reconcile through session.get() and session.messages() after reconnect. If reliable automation feeds are expected, replay cursors and persisted events may be worth thinking through sooner since that would be more structural.
  • Projection contract: outer routing directory is virtualized, while nested native payload fields may still include sandbox-local values. Preserving those fields is a reasonable v1 choice, but it may help to describe nested payload content as opaque so clients do not build dependencies on runtime topology.
  • Cold-history contract: keeping cursors opaque and omission states explicit should preserve flexibility. It may be useful to avoid documenting storage ordering or cursor encoding as public behavior.
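The projection contract described above (outer routing virtualized, nested payload left alone and treated as opaque) might look like the sketch below. The event shape, the field names `directory` and `sessionID` at the outer level, and `projectEvent` itself are assumptions for illustration; the real projection lives in public-sdk-projection.ts and is not reproduced here.

```typescript
// Hypothetical shape of a runtime event as seen by the facade.
type RuntimeEvent = {
  directory: string; // outer routing field, sandbox-local in the runtime
  sessionID: string; // outer routing field used for ownership filtering
  payload: unknown;  // nested content, passed through without inspection
};

function projectEvent(
  event: RuntimeEvent,
  ownedRootSessions: ReadonlySet<string>,
  virtualDirectory: string,
): RuntimeEvent | null {
  // Events for child sessions or sessions the user does not own are dropped.
  if (!ownedRootSessions.has(event.sessionID)) return null;
  // Only the outer routing field is rewritten; the payload object is
  // forwarded as-is, so clients should not rely on anything inside it.
  return { ...event, directory: virtualDirectory };
}
```

Typing the payload as `unknown` is one way to make the "opaque" guidance visible to client authors at compile time.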

Some technical debt seems acceptable for a first version, with a few areas worth watching as usage grows:

  • UserKiloFacade is taking on several coordination responsibilities. That is fine initially, though stable policies such as event projection and producer fencing may become good extraction points later.
  • User-scoped facade DO may become hot for users with many active sessions or subscribers. Metrics around subscriber count, event throughput, and slow-consumer eviction should make it clear whether internal sharding is ever needed.
  • Warm and cold reads have separate implementations. Shared contracts and current tests are a solid base; a small conformance suite could help prevent behavior drift as SDK surface expands.
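If producer fencing is extracted later, the connection-oriented policy from the PR's reviewer notes (newer wrapper connections replace older sockets) could be isolated along these lines. `ProducerFence`, `ProducerSocket`, and keying the slot by session ID are hypothetical choices for this sketch; the PR does not show how UserKiloFacade tracks producers.

```typescript
// Minimal socket surface needed by the policy (hypothetical).
interface ProducerSocket {
  close(code: number, reason: string): void;
}

class ProducerFence<S extends ProducerSocket> {
  // One accepted producer per key; keying by session ID is an assumption.
  private current = new Map<string, S>();

  // Called once per connection, after identity validation at upgrade time.
  accept(sessionId: string, socket: S): void {
    const previous = this.current.get(sessionId);
    this.current.set(sessionId, socket);
    if (previous && previous !== socket) {
      previous.close(1000, "replaced by newer producer");
    }
  }

  // Local identity check only: frames from a replaced socket are ignored.
  isCurrent(sessionId: string, socket: S): boolean {
    return this.current.get(sessionId) === socket;
  }

  // A socket may only release the slot it still holds.
  release(sessionId: string, socket: S): void {
    if (this.isCurrent(sessionId, socket)) this.current.delete(sessionId);
  }
}
```

The `isCurrent` check compares socket identity inside the facade; it does not revalidate frames against CloudAgentSession, which matches the behaviour the reviewer notes describe.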

None of these look like reasons to hold the PR for a broader redesign. The main thing is keeping v1 transport details from becoming accidental public guarantees, so the implementation remains easy to evolve after shipping.

@eshurakov eshurakov merged commit 3788e4d into main Jun 3, 2026
58 checks passed
@eshurakov eshurakov deleted the equal-newsstand branch June 3, 2026 13:18

2 participants